This notebook explores land size distributions in household survey
data. I begin by assessing the data we have available, looking
specifically at the number of samples, spatial coverage, and
completeness. I then explore the data on land size and explain why
land size is an interesting variable to study. Finally, I examine the
distributions of land size in different locations.
Assessing Data Coverage and Completeness
We are interested in the distribution of farm characteristics at the
subnational level. First we need to check how much data we have in
different subnational areas, and whether we potentially have sufficient
data per area to plot a distribution.

For this study, we are interested in areas where we have both household
survey data and external datasets (census, climate, and land cover).
Nine countries fulfil these criteria.
Looking at the number of surveys per country, three candidate
countries stand out because they share two helpful characteristics:
- They have a relatively large number of surveys (n > 2000)
- They have surveys in a significant portion of their subnational
  areas (> 40%)
The three countries are Burkina Faso (West Africa), Rwanda (Central
Africa), and Tanzania (East Africa).
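The screening described above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the records, the country labels, the per-country totals of subnational areas, and the function names are all hypothetical. Only the two thresholds (n > 2000 and coverage > 40%) come from the text.

```python
from collections import defaultdict

# Hypothetical records: one (country, subnational_area) pair per survey.
surveys = [
    ("A", "a1"), ("A", "a1"), ("A", "a2"), ("A", "a3"),
    ("B", "b1"), ("B", "b1"),
]
# Hypothetical total number of subnational areas in each country.
total_areas = {"A": 4, "B": 5}

def coverage_summary(surveys, total_areas):
    """Return {country: (n_surveys, share_of_areas_with_surveys)}."""
    n_surveys = defaultdict(int)
    areas_seen = defaultdict(set)
    for country, area in surveys:
        n_surveys[country] += 1
        areas_seen[country].add(area)
    return {
        c: (n_surveys[c], len(areas_seen[c]) / total_areas[c])
        for c in n_surveys
    }

def candidates(summary, min_surveys=2000, min_coverage=0.4):
    """Countries with more than min_surveys surveys and coverage above min_coverage."""
    return sorted(
        c for c, (n, cov) in summary.items()
        if n > min_surveys and cov > min_coverage
    )

summary = coverage_summary(surveys, total_areas)
# Toy threshold so that the tiny example selects something.
selected = candidates(summary, min_surveys=3)
print(summary, selected)
```

With real data the default thresholds would apply; the toy call lowers `min_surveys` only because the example has six records.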



Why Land Size
In this study, we are interested in land size. Land size is an
interesting variable for smallholder farmers because it can tell us a
lot about other variables, some of which are more error-prone or more
difficult to measure (e.g. income and food security).
If we plot land size directly against income and food security, it
can be difficult to see the relationship due to outliers and high
variation. However, if we look at quantiles and distributions, we
begin to see that there might be an association between land size and
total income, food security, and livestock holdings.
Interestingly, the spread varies between quantiles. For example, at
larger land sizes we see a larger spread in income values.
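One way to look at quantiles rather than raw scatter is to bin households by land-size quantile and summarise the outcome within each bin. The sketch below assumes a list of hypothetical (land size, income) pairs. The numbers are invented for illustration and the function name is not from the notebook.

```python
import statistics

# Hypothetical (land_size_ha, income) pairs; values are illustrative only.
households = [
    (0.5, 100), (0.8, 110), (1.0, 120), (1.2, 130),
    (2.0, 150), (3.0, 300), (4.0, 500), (6.0, 900),
]

def spread_by_land_quantile(pairs, n_bins=4):
    """Bin by land-size quantile; return (median, IQR) of the outcome per bin.

    A bin with fewer than two households yields None.
    """
    land = [size for size, _ in pairs]
    cuts = statistics.quantiles(land, n=n_bins)  # n_bins - 1 cut points
    bins = [[] for _ in range(n_bins)]
    for size, outcome in pairs:
        index = sum(size > c for c in cuts)  # number of cut points below size
        bins[index].append(outcome)
    result = []
    for values in bins:
        if len(values) < 2:
            result.append(None)
            continue
        q1, median, q3 = statistics.quantiles(values, n=4)
        result.append((median, q3 - q1))
    return result

# Two bins (below and above the median land size) for this small example.
by_bin = spread_by_land_quantile(households, n_bins=2)
print(by_bin)
```

In this toy data the upper land-size bin has both a higher median income and a wider interquartile range, which is the pattern the text describes.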



Why Map the Spread of Land Sizes
In mapping efforts, we often see researchers trying to map averages.
For example, the Lowder article mapped average farm size per
subnational unit using census information and information on total
arable land.
Across areas, however, land sizes vary widely. Here we see a wide
range of land size distributions, which differ considerably by
country.

If we were to use land size to prioritise development interventions,
it would be important to account for the characteristics of land size
distributions.
In the plots below, there appears to be an association between the
average (mean or median) land cultivated and the spread of land
cultivated (standard deviation or IQR). This relationship appears to
be heteroskedastic: in areas with a larger average cultivated area,
the spread is also greater (i.e. households cultivate more varied
land sizes).
We also see that for most subnational areas, land size distributions
are skewed, and in many of these areas the distributions are
fat-tailed (kurtosis above 3, the value for a normal distribution).
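The summary statistics mentioned above can be computed per subnational area roughly as follows. This is a sketch under stated assumptions: it uses population (not bias-corrected) moments, reports non-excess kurtosis so that a normal distribution scores 3, and the sample values are hypothetical.

```python
import statistics

def describe(values):
    """Location, spread, and shape statistics for one area's land sizes."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    q1, median, q3 = statistics.quantiles(values, n=4)
    # Standardised third and fourth central moments.
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "median": median,
        "sd": sd,
        "iqr": q3 - q1,
        "skewness": skewness,
        "kurtosis": kurtosis,  # non-excess: a normal distribution gives 3
    }

# Hypothetical land sizes (ha): many small farms and one large one.
skewed_area = describe([0.2, 0.3, 0.5, 0.5, 0.8, 1.0, 1.5, 6.0])
symmetric_area = describe([1, 2, 3, 4, 5])
print(skewed_area)
print(symmetric_area)
```

For the first sample the skewness is positive and the kurtosis exceeds 3, while the symmetric sample has zero skewness. Small samples give noisy moment estimates, so per-area values should be read with the sample size in mind.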




What do these distributions mean in practice? Let's say we are using
land size to target development interventions towards the poorest of
the poor.
We could target interventions based on mean land cultivated, i.e. we
assume that a lower mean land cultivated means more people with
smaller farms, and that people with smaller farms have lower incomes
or food security status.
Let us see what this would look like with the data we have. Suppose
we take a threshold value, say households farming less than 2
hectares. We would see that the proportion of households below the
threshold can vary quite significantly, depending on the location.
For example, in areas where the mean land cultivated is about 1.8 ha,
the proportion of households cultivating less than 2 ha can vary from
about 35% to 100%.
In large scale surveys, it is much more difficult to see this type of
small-scale variation, as often only a few households are sampled in
these subnational areas.
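A small worked example makes the threshold argument concrete. The two areas below are hypothetical and were constructed to share a mean of 1.8 ha. They are not taken from the survey data and do not reproduce the 35% to 100% range reported above.

```python
import statistics

THRESHOLD_HA = 2.0  # threshold used in the text

# Hypothetical land sizes (ha) for two areas with the same mean (1.8 ha).
areas = {
    "tight": [1.7, 1.8, 1.8, 1.9, 1.8],
    "spread": [0.4, 0.5, 0.6, 2.5, 5.0],
}

def share_below(values, threshold=THRESHOLD_HA):
    """Proportion of households cultivating less than `threshold` hectares."""
    return sum(v < threshold for v in values) / len(values)

means = {name: statistics.fmean(values) for name, values in areas.items()}
shares = {name: share_below(values) for name, values in areas.items()}
for name in areas:
    print(name, round(means[name], 2), shares[name])
```

Both areas would rank identically under a mean-based targeting rule, yet every household in the first area is under the threshold and only three in five are in the second.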


Covariates
It is clear from an initial exploration that there is more going on
at the subnational level than variation in the mean. In this study we
want to see whether we can predict this variation.
To predict characteristics of subnational heterogeneity, we need data
at the subnational level.
For this study we have aggregated census data (GEO2 level), climate
data, and land-cover information. Here are the variables we have.
We don’t have all of these variables for all of these subnational
areas, so let’s see what we do have.
From this assessment of data completeness, it seems that covariates
are not available in the Nigeria data. There is one region in Kenya
where covariates are not available.
Data on agricultural employment are missing in Nigeria, Kenya,
Burkina Faso, and Ethiopia. Data on “telephone” are missing in
Nigeria, Kenya, and Mali, and data on “toilet” are missing in Kenya
and Nigeria.
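A completeness check of this kind can be sketched as the share of missing values per covariate within each country. The rows, country labels, and values below are hypothetical. Only the covariate names "telephone" and "toilet" echo the text, and the function name is an illustration rather than the notebook's own code.

```python
from collections import defaultdict

# Hypothetical rows, one per subnational area; None marks a missing value.
rows = [
    {"country": "A", "telephone": 0.31, "toilet": 0.55},
    {"country": "A", "telephone": None, "toilet": 0.60},
    {"country": "B", "telephone": None, "toilet": None},
    {"country": "B", "telephone": None, "toilet": 0.42},
]

def missing_share(rows, group_key="country"):
    """Return {group: {variable: share of rows where the value is None}}."""
    counts = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row[group_key]
        counts[group] += 1
        for name, value in row.items():
            if name != group_key:
                missing[group][name] += value is None  # True counts as 1
    return {
        group: {name: k / counts[group] for name, k in per_variable.items()}
        for group, per_variable in missing.items()
    }

completeness = missing_share(rows)
print(completeness)
```

A share of 1.0 for a covariate flags a country where that variable is entirely unavailable, which is the situation the text reports for some covariates.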
Modelling Location Scale and Shape
In distributional models,
Notes
This exploration is a step towards developing procedures to map
smallholder heterogeneity. I do not want to imply that mapping land size
is the only thing we need to do for targeting the “poorest of the
poor”.